Kernel functions in support vector machines (SVM) are needed to assess the similarity of input samples in order to classify these samples, for instance. Besides standard kernels such as Gaussian (i.e., radial basis function, RBF) or polynomial kernels, there are also specific kernels tailored to consider structure in the data for similarity assessment. In this article, we will capture structure in data by means of probabilistic mixture density models, for example Gaussian mixtures in the case of real-valued input spaces. From the distance measures that are inherently contained in these models, e.g., Mahalanobis distances in the case of Gaussian mixtures, we derive a new kernel, the responsibility weighted Mahalanobis (RWM) kernel. Basically, this kernel emphasizes the influence of the model components from which any two samples that are compared are assumed to originate (that is, the "responsible" model components). We will see that this kernel outperforms the RBF kernel and other kernels capturing structure in data (such as the LAP kernel in Laplacian SVM) in many applications where partially labeled data are available, i.e., for semi-supervised training of SVM. Other key advantages are that the RWM kernel can easily be used with standard SVM implementations and training algorithms such as sequential minimal optimization, and heuristics known for the parametrization of RBF kernels in a C-SVM can easily be transferred to this new kernel. Properties of the RWM kernel are demonstrated with 20 benchmark data sets and an increasing percentage of labeled samples in the training data.
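To illustrate the idea described above, the following sketch builds an RWM-style kernel on top of a fitted Gaussian mixture: component-wise squared Mahalanobis distances between two samples are weighted by the components' responsibilities for those samples and plugged into an RBF-style exponential. The exact weighting scheme used here (averaging the two samples' responsibility vectors) is an assumption for illustration, not necessarily the precise definition from the article.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def rwm_kernel(X1, X2, gmm, gamma=1.0):
    """Sketch of a responsibility weighted Mahalanobis (RWM) kernel.

    Squared Mahalanobis distances w.r.t. each mixture component are
    weighted by the mean responsibility of that component for the two
    samples being compared, so "responsible" components dominate.
    The averaging rule is an illustrative assumption.
    """
    R1 = gmm.predict_proba(X1)  # responsibilities, shape (n1, J)
    R2 = gmm.predict_proba(X2)  # responsibilities, shape (n2, J)
    precisions = [np.linalg.inv(C) for C in gmm.covariances_]
    K = np.zeros((len(X1), len(X2)))
    for i in range(len(X1)):
        for k in range(len(X2)):
            d = X1[i] - X2[k]
            # squared Mahalanobis distance per mixture component
            d2 = np.array([d @ P @ d for P in precisions])
            # weight by the averaged responsibilities of both samples
            w = 0.5 * (R1[i] + R2[k])
            K[i, k] = np.exp(-gamma * (w @ d2))
    return K

# fit a Gaussian mixture on toy two-cluster data and evaluate the kernel
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)
K = rwm_kernel(X[:5], X[:5], gmm)
```

Because the kernel matrix is precomputed, it can be passed to a standard SVM implementation (e.g., scikit-learn's `SVC(kernel="precomputed")`), matching the claim that the RWM kernel works with off-the-shelf SVM training algorithms.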